FOREWORD


The INPUT QUALITY Task conducts a continuing research program on screening and induction
techniques. Objectives are (1) to improve the system for screening potential enlisted input so as to
identify and reject more effectively those who are not readily trainable and usable in the service;
(2) to aid in manpower planning by developing methods for estimating the mental abilities of the
civilian pool available for service under various conditions; and (3) to develop technical information
for use in consultative assistance to staff agencies responsible for procurement and standards
policies.


As one avenue to the development of technical information and methods for improving input
screening, the potential contribution of programmed testing has been explored. In this connection,
experimental branching tests were developed. The present publication reports on the trial
administration of two branching tests by means of a computerized system with teletypewriter input/output
developed by the National Bureau of Standards. Research was a part of Subtask i, "An exploratory
study of branching tests." The entire INPUT QUALITY Task is responsive to special requirements
of the Deputy Chief of Staff for Personnel, as well as to requirements to contribute to achievement of
the objectives of RDT&E Project 2J024701A722, "Selection and Behavioral Evaluation," FY 1967
Work Program.


J. E. UHLANER, Director
Behavioral  Science 
Research  Laboratory 


AN  EXPLORATORY  STUDY  OF  BRANCHING  TESTS 


BRIEF 


REQUIREMENT: 

To explore the comparability of computerized branching tests and conventional paper-and-pencil
tests with respect to reliability, information conveyed by the test score, and rationale of test
construction.

PROCEDURE: 

Two specially constructed 8- to 9-item branching tests (verbal and arithmetic reasoning) and the
corresponding conventional 50- and 40-item tests of the Army Classification Battery were
administered to a sample of 102 enlisted men. Scores were analyzed to estimate reliability and to study
relationships between corresponding branching and conventional tests.


FINDINGS: 

The short branching tests were substantially correlated with counterpart longer conventional
tests (r = .83 and .79, higher than would be expected with equally short conventional tests).
Classical test theory developed for the construction of linear tests is not entirely appropriate in developing
branching tests. For example, item difficulty indexes based on population performance are not fully
appropriate when each examinee is tested with questions geared to his own ability level.


UTILIZATION  OF  FINDINGS: 

The  exploratory  study  reinforced  the  promise  of  branching  tests  and  pointed  to  the  need  for 
reexamination  of  test  theory. 
AN EXPLORATORY STUDY OF BRANCHING TESTS


In the past several years, the U. S. Army Behavioral Science Research
Laboratory (BESRL) has been interested in new approaches to testing which
might prove to be improvements over conventional methods. One line of
interest has been the branching technique. Branching is provided by
programming a test so that an examinee who answers a test item correctly is
presented next with a more difficult item, and an examinee who answers
incorrectly is presented with an easier item. By contrast, the conventional
test is linearly programmed so that all examinees answer the same items
regardless of the correctness of their responses. The branching test
has the potential of reducing error in test scores or of providing scores
of validity equal to that of the linear test, but with fewer items. A
preliminary study¹ had indicated the theoretical promise of branching
tests. However, the promise could not be followed up in the format of the
conventional printed test², and automated methods were required.

BESRL contracted with the National Bureau of Standards (NBS) to conduct
a preliminary design study of a programmed testing machine which would meet
a number of specified requirements, including the requirements for branching.³
Following completion of the design study, NBS, out of its own interest in
the technique, developed a computer system with teletypewriter input/output
which was programmed to provide branching but not to meet the other
requirements covered in the design study. NBS invited BESRL to use the system for
exploratory research. The impending move of NBS to a new location made it
necessary to act quickly.

Accordingly, two branching tests, one of verbal ability and the other
of arithmetic reasoning, were assembled from item data readily available.
These tests and counterpart conventional tests were administered to a group
of enlisted men and the results were compared. The objective was to obtain
indications of the research promise of the branching technique and to uncover
some of the major problems likely to be encountered in a systematic study
of the branching technique, as well as to provide NBS with a use test of
its computer system.


¹ Waters, Carrie Jean. Preliminary evaluation of simulated branching test.
BESRL Technical Research Note 140. June 1964.

² Seeley, L. C., Morton, Mary A., and Anderson, Alan A. Exploratory study
of a sequential item test. BESRL Technical Research Note 129. December
1962.

³ Bayroff, A. G. Feasibility of a programmed testing machine. BESRL
Research Study 64-3. November 1964.


The Branching Tests


Items for the branching tests were selected from the experimental
forms of the Armed Forces Qualification Test, AFQT 7-8 and AFQT 5-6.
Selection was mainly of items not included in the operational forms of
AFQT. Each test plan (Figure 1) called for a pool of items with a
difficulty range of p = .95 to .25, beginning with an item of p = .60. All
examinees were to answer 8 items with difficulty differences of p = .05.
Examinees who reached the most difficult item (p = .25) and answered it
correctly were to be presented with an additional item of greater difficulty
(p = .20) as a means of increasing the ceiling. The items were selected
to meet this plan as closely as possible. The four-choice items were
modified by adding two incorrect choices as a means of reducing chance
success.

The score was determined by the relative difficulty of the item
reached in the final stage. This stage had a difficulty range of p = .95
to .20 and provided a scale with a raw score range of 1 to 17. Each of
these final items had two score values: a score for answering the item
incorrectly, and the score increased by 1 for answering the item correctly.
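
The plan and scoring rule described above can be sketched in code. The Python below is a minimal illustration under stated assumptions, not the NBS program: the function name, the representation of p-values as integer hundredths, and the item numbering (one item in stage 1, two in stage 2, and so on, numbered consecutively with the hardest item of each stage first) are assumptions inferred from the plan and from the item numbers quoted later in the report.

```python
def run_branching_test(answers):
    """Trace one examinee through the 8-stage branching plan.

    answers: list of booleans (True = correct), one per item presented.
             Nine answers are needed only when the first eight are all correct.
    Returns (item_numbers, p_values_in_hundredths, raw_score).
    """
    answers = list(answers)
    items, p_values = [], []
    n_correct = 0  # correct answers given before the current stage
    last_correct = False
    for stage in range(1, 9):
        n_wrong = stage - 1 - n_correct
        # Assumed numbering: stage k holds k items, hardest item first.
        first_in_stage = 1 + stage * (stage - 1) // 2
        items.append(first_in_stage + n_wrong)
        # Start at p = .60; each correct answer lowers p by .05,
        # each incorrect answer raises it by .05.
        p_values.append(60 - 5 * n_correct + 5 * n_wrong)
        last_correct = bool(answers[stage - 1])
        if stage < 8:
            n_correct += last_correct
    # Each final-stage item carries two score values (wrong, right).
    score = 2 * n_correct + 1 + last_correct
    if n_correct == 7 and last_correct:
        # Ceiling item (p = .20) for examinees who passed the hardest item.
        items.append(37)
        p_values.append(20)
        score += bool(answers[8])
    return items, p_values, score
```

Under these assumptions, nine correct answers trace the sequence of items 1, 2, 4, 7, 11, 16, 22, 29, 37 and yield the maximum raw score of 17, while eight incorrect answers end at the easiest final item with a raw score of 1.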


The Conventional Linear Tests

The two conventional linear tests administered were the Verbal Test
(VE-2B) and the Arithmetic Reasoning Test (AR-LB) of the Army Classification
Battery. These tests are power tests of 50 items and 40 items, respectively,
each item having four alternatives. Total scores were corrected for chance
success.


PROCEDURE 

The two branching tests and the two linear tests were administered to
all examinees in counterbalanced order, half taking the two branching tests
first and half taking the linear tests first. The examinees were 102 enlisted
men from Fort Belvoir, Virginia, with a wide range of scores on the General
Technical Aptitude Area, a composite of VE and AR. No particular sampling
design was attempted. Examinees were told they were taking part in an
experiment.

The linear tests were administered to groups of about 25 men. The
branching tests were administered to one examinee at a time. The
teletypewriter typed out the branching test item, and the examinee responded by
pressing a typewriter key appropriate to the alternative selected. So
long as the item was on display, the examinee could change his answer by
pressing the key for another alternative. A "Record" key entered the last
alternative selected as the answer. The computer scored omitted items as
wrong; hence, examinees were instructed to guess if necessary. Examinees
were also instructed to guess on the linear tests. Examinees were not
informed of the nature of the branching tests.
Figure 1. Branching test plan


RESULTS  AND  DISCUSSION 


A correlation coefficient was computed between each of the branching
tests and its counterpart linear test. The two orders of testing were
combined. Correlation between the two verbal ability tests was r = .78;
between the two arithmetic reasoning tests, r = .74. When corrected to
the mobilization population, using the GT aptitude area as the selector,
correlation was increased to r = .83 for VE and .79 for AR (Table 1).


Table 1

CORRELATION OF BRANCHING TESTS WITH CONVENTIONAL LINEAR TESTS

                                             Correlation Coefficientᵃ
                  No. of                     Branching         Linear
Tests             Items      M      S.D.     VE      AR      VE      AR

Branching (Raw Scores)
  VE               8-9      10.6     2.8             .57     .78     .64
  AR               8-9       9.9     3.9                     .50     .74

Linear (Army Standard Scores)
  VE               50      105.9    19.1     .83             .91ᵇ    .65
  AR               40      105.6    17.7             .79             .85ᵇ

ᵃ Coefficients above diagonal uncorrected for selection; coefficients below diagonal corrected to mobilization population.
ᵇ Test-retest reliability in mobilization population.


The test-retest reliability estimates of the ACB tests, as recently
determined in carefully constructed samples of enlisted men and corrected
to the mobilization population, are r = .91 for VE and r = .85 for AR.
These coefficients represent the maximum correlation that could practically
be expected between the 8- or 9-item branching tests and the 40- and 50-item
linear tests of the ACB.

To provide a frame of reference for these data, two estimates were
made: (1) the length a linear test would have to be in order to be as
reliable as the 8-item branching test, and (2) the correlation between
8-item linear tests and their counterpart long linear tests.⁴


⁴ The following analyses were contributed by Dr. John J. Mellinger, Chief,
Statistical Research and Consultation Branch, Statistical Research
and Analysis Division, BESRL.
The reliability coefficients of the branching tests were based on the
reliability indexes of the two long linear tests (.95 for VE, .92 for AR)
obtained from the test-retest reliability coefficients. To obtain the
reliability indexes of the 8-item branching tests, the coefficients of
correlation between the branching tests and their long linear counterparts
(.83 for VE, .79 for AR) were divided by the reliability indexes of the
long linear tests. From these indexes (.87 for VE, .86 for AR) the
reliability coefficients of the branching tests were computed (.76 for
VE, .73 for AR). The Spearman-Brown formula indicated that a linear VE
test with a reliability coefficient of .76 would require 16 items; a
linear AR test with a reliability coefficient of .73 would require 19
items in contrast to the branching tests of 8-9 items.
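
The arithmetic in this paragraph can be retraced with a short sketch. The Python below is illustrative only; the function names are hypothetical, the reliability index is taken as the square root of the reliability coefficient, and the Spearman-Brown formula is solved for the length factor. Applying the length formula to the rounded coefficients (.76 and .73) is an assumption, made because it gives the item counts reported.

```python
import math


def branching_reliability(r_with_long, rel_long):
    """Reliability coefficient of the short test, from its correlation with
    the long test and the long test's reliability coefficient."""
    index_long = math.sqrt(rel_long)          # reliability index of long test
    index_short = r_with_long / index_long    # reliability index of short test
    return index_short ** 2


def equivalent_linear_length(rel_long, n_long, rel_target):
    """Items a linear test needs to reach rel_target, by Spearman-Brown,
    given a linear test of n_long items with reliability rel_long."""
    k = rel_target * (1 - rel_long) / (rel_long * (1 - rel_target))
    return k * n_long


ve_rel = round(branching_reliability(0.83, 0.91), 2)   # verbal
ar_rel = round(branching_reliability(0.79, 0.85), 2)   # arithmetic reasoning
ve_items = round(equivalent_linear_length(0.91, 50, ve_rel))
ar_items = round(equivalent_linear_length(0.85, 40, ar_rel))
```

With the values given in the text, this sketch returns reliability coefficients of .76 and .73 and equivalent linear lengths of 16 and 19 items.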

Estimates of the correlation between 8-item linear tests and the
longer linear tests were derived from their respective reliability
indexes. On this basis, it was estimated that the correlation between
the 8-item linear tests and the 40- and 50-item tests was r = .75 for VE
and r = .67 for AR. Comparable correlation coefficients for the branching
tests were .83 for VE and .79 for AR.
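
One way to arrive at these estimates is sketched below. The reading of the procedure is an assumption: the reliability of an 8-item linear test is projected from the long test by the Spearman-Brown formula, and the correlation between short and long tests is taken as the product of their reliability indexes. The function names are hypothetical; the approach does reproduce the figures quoted above.

```python
import math


def shortened_reliability(rel_long, n_long, n_short):
    """Spearman-Brown projection of reliability for a test cut to n_short items."""
    k = n_short / n_long
    return k * rel_long / (1 + (k - 1) * rel_long)


def short_long_correlation(rel_long, n_long, n_short):
    """Estimated correlation between a short and a long linear test,
    taken as the product of their reliability indexes."""
    rel_short = shortened_reliability(rel_long, n_long, n_short)
    return math.sqrt(rel_short) * math.sqrt(rel_long)


r_ve = short_long_correlation(0.91, 50, 8)   # about .75
r_ar = short_long_correlation(0.85, 40, 8)   # about .67
```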

The correlation coefficient (uncorrected for selection) between the
branching VE test and the linear AR test (r = .64) was essentially the same
as the uncorrected correlation coefficient between the linear VE test and the
linear AR test (r = .65). The comparable coefficient between the
branching AR test and the linear VE test was lower (r = .50), presumably a
function of the marked skew in the branching AR test distribution, as
described below.

The distributions of the linear tests and of the branching VE test
were approximately normal. However, the distribution of the branching
AR test departed markedly from normality, with 12 of the 102 examinees
obtaining the maximum score. These 12 examinees had Army standard scores
of 107 to 141 on the linear AR test, indicating that a higher ceiling for
the branching test might have resulted in higher correlation with the
conventional test. The behavior of the AR items in the only sequence possible
for those obtaining the maximum score (correct answers for items 1, 2, 4,
7, 11, 16, 22, 29, 37) was examined. These items differed by successive
decrements of p = .05, approximately. Beginning with the fourth item in
this sequence (item 7), all the succeeding items were answered correctly
by most of the examinees who attempted them, in spite of the range of
difficulty (p = .45 to .20). For these examinees, the items were apparently
equal in difficulty.

The apparently lesser difficulty of the items in the AR branching
test raises a question concerning the index of difficulty. The conventional
p-value is a population value: the proportion of a population that answers
an item correctly. It does not indicate the proportion of a particular
level of ability that answers correctly. An item that is considered
difficult because of its low p-value is not necessarily equal in difficulty
for all levels of ability. Items which differ in p-value for the entire
population may, in fact, be equal in the proportion of higher levels of
the population which answer correctly; conversely, items which are equal
in p-value for the entire population may differ in the proportion of the
lower levels of the population which answer correctly. The logic of the
branching program does not appear compatible with population indexes of
difficulty. After the initial item, the difficulty indexes must be
related to ability level, uncontaminated by the contributions of the rest
of the population.
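
The distinction between a population p-value and the proportion correct at one ability level can be shown with a small numerical sketch. All figures below are hypothetical and chosen only for illustration (three equally sized ability levels and two invented items); none come from the study data.

```python
def population_p(p_by_level, weights):
    """Population p-value as the weighted mean of proportions correct by level."""
    total = sum(weights)
    return sum(p * w for p, w in zip(p_by_level, weights)) / total


# Hypothetical proportions correct at (low, middle, high) ability levels.
weights = (1, 1, 1)             # assumed equal-sized ability groups
item_a = (0.20, 0.70, 0.95)
item_b = (0.05, 0.40, 0.95)

p_a = population_p(item_a, weights)   # about .62
p_b = population_p(item_b, weights)   # about .47
same_for_high_level = item_a[2] == item_b[2]
```

In this invented case the two items differ by about .15 in population p-value, yet examinees at the highest level answer both correctly in the same proportion, which is the situation the paragraph above describes.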

Another problem concerns the homogeneity of items. The problem is
not unique to branching tests, of course, but is emphasized. The small
number of items that are answered and the variation in the particular
items that are responded to by different examinees make it more difficult
to sample only the common content than in the linear tests with their larger
numbers of items, all of which are responded to by all the examinees.
Moreover, as with the computation of difficulty indexes, homogeneity
indexes such as item-test correlation coefficients need to differentiate
the effect of examinees who are presented with particular items in the
branching test and those who are not.

Since it is possible to arrive at a terminal item by a variety of
pathways or item sequences, except for the easiest and most difficult
terminal items, it was of interest to determine if, in fact, such variety
did occur. Accordingly, the pathways taken by each examinee were tabulated
and grouped according to terminal item. In most instances, as Table 2
indicates, a variety of pathways, differing in the average p-value of the
items, were taken to arrive at the same terminal item. This finding has
several implications, in addition, of course, to the possibility that
these are chance variations: (1) Item p-values as indexes of difficulty
are relatively imprecise, which is not a new finding. (2) The branching program
permits the individual to respond according to item difficulty for himself,
which is different from the difficulty represented by population p-values.
(3) Variation in pathway may be a significant parameter of branching tests,
and if incorporated in the scoring may contribute to more effective
discrimination.
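
The number of pathways possible to each final-stage item follows from the branching plan. The sketch below assumes that a final-stage item is identified solely by how many of the seven preceding items were answered correctly, so that the count of pathways is a binomial coefficient; the function name is hypothetical.

```python
from math import comb


def pathways_to_final_items(n_stages=8):
    """Pathways leading to each final-stage item, hardest item first.

    An item reached after k correct answers out of the (n_stages - 1)
    preceding items can be reached in C(n_stages - 1, k) different orders.
    """
    prior = n_stages - 1
    return [comb(prior, k) for k in range(prior, -1, -1)]


counts = pathways_to_final_items()   # one entry per final-stage item
```

For the 8-stage plan this gives 1, 7, 21, 35, 35, 21, 7, and 1 pathways. Each count appears twice in the Pathways Possible column of Table 2 because every final item carries two score values, and the additional ceiling item can be reached by only one pathway.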


General  Observations 


The machine mode of administering the branching tests apparently
aroused great interest in the examinees. The occasional difficulties
with the equipment were promptly dealt with and did not appear to
introduce error into the scores nor adversely affect motivation. All the
examinees had taken the same or similar linear tests within the past few
months. However, it is not possible to tell how much change in scores
occurred, since the original scores were not available.
Table 2

NUMBER OF DIFFERENT PATHWAYS TO EACH TERMINAL ITEM ON BRANCHING TESTS

          Terminal          f           Pathways      Pathways Taken
Score       Item        VE     AR       Possible        VE     AR

   1         36          0      0           1            0      0
   2         36          0      2           1            0      1
   3         35          1      2           7            1      2
   4         35          1      3           7            1      2
   5         34          0      6          21            0      5
   6         34          4      6          21            4      6
   7         33          5     12          35            4     12
   8         33         18      7          35           13      6
   9         32          7     11          35            6      8
  10         32         20     13          35           12     10
  11         31          5      9          21            4      7
  12         31          6      5          21            5      4
  13         30         16      4           7            5      3
  14         30         10      9           7            5      6
  15         29          8      0           1            1      0
  16         37          0      1           1            0      1
  17         37          1     12           1            1      1

Totals                 102    102         256           62     74

Print-outs of responses to the branching tests indicated that
practically no examinee had changed an answer, although the computer system
permitted changes to be made while the item was displayed. It is not clear
whether machine presentation increased the confidence of the examinees or
whether they were more interested in the machine operation than in their
test scores, especially since they knew they were taking part in an
experiment.


The  computer  system  used  with  the  branching  program  displayed  one 
item  at  a  time.  The  examinee  could  see  the  print-out  of  the  preceding 
items  and  his  responses  but  could  not  change  the  responses.  The  printed 
format, of course, makes many items available at a time and does permit
the  examinee  to  vary  his  order  of  responding  and  to  change  his  responses 
to  preceding  items.  Furthermore,  the  computer  system  all  but  precluded 
omissions,  whereas  omissions  did  occur  on  the  conventional  tests.  The 
extent  to  which  these  incidental  differences  affected  the  correlation  is 
not  known. 


SUMMARY  OF  FINDINGS  AND  CONCLUSIONS 

Correlation between the short branching tests and their counterpart
long linear tests was substantial (r = .83 and .79, corrected for
restriction in range). Coefficients were considerably higher than would be
expected if equally short linear tests were substituted for the branching
tests (r = .75 and .67), and approached the test-retest reliability of the
long linear tests (r = .91 and .85). Linear tests to be as reliable as
the branching tests would have to be roughly twice as long or longer (an
estimated 16 and 19 items, against 8 or 9).

Classical  test  theory  from  which  the  linear  model  is  derived  does  not 
appear  completely  helpful  in  understanding  the  branching  model.  The  two 
models,  while  they  appear  to  treat  test  items  as  independent  samples  of 
ability,  differ  in  other  respects.  The  linear  model  requires  all  examinees 
to  respond  to  the  same  set  of  items;  the  branching  model  presents  different 
items  to  examinees  of  different  ability  levels.  In  the  linear  model,  the 
items  presented  are  unrelated  to  the  preceding  responses;  in  the  branching 
model,  the  items  presented  are  determined  by  the  preceding  responses.  In 
the linear model, item statistics are based on the performance of the
population; in the branching model, performance by ability level must be
considered. In the linear model, the method of limiting the score to reflect
correct knowledge is a statistical correction for chance success; in the
branching model, the method reduces the opportunity for chance success.
The linear score is based on the number of items answered correctly; the
branching score, on the relative difficulty of the last item. In sum,
the  branching  model  resembles  the  psychophysical  concept  of  the  limen  to 
which  the  classical  additive  theory  seems  only  partially  applicable.  If 
the  results  of  the  present  study  are  substantiated,  new  developments  in 
test  theory  seem  necessary. 




The net results of this exploratory study indicated the definite
research promise of the branching program for tests. The problems uncovered
do not seem insuperable. Research directed toward such problems as
effectiveness of branching variants, determination of optimum test length, size
of difficulty interval, contribution of other item and score parameters,
and generalizability to other content areas should prove profitable and may
lead to the eventual development of branching tests for operational use.
The immediate need is for equipment designed specifically for
experimentation with the branching technique.

